Improving Relative-Entropy Pruning using Statistical Significance
Authors
Abstract
Relative entropy-based pruning has been known to be an efficient method for pruning language models for more than a decade. Recently, this method has been applied to phrase-based machine translation, and results suggest that it is comparable to the state-of-the-art pruning method based on significance tests. In this work, we show that these two methods are effective in pruning different types of phrase pairs. On one hand, relative entropy pruning searches for phrase pairs that can be composed from smaller constituents with little or no loss in probability. On the other hand, significance pruning removes phrase pairs that are likely to be spurious. We then show that these methods can be combined to produce better results, over both metrics, than either method used individually.
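The two criteria described above can be sketched as follows for a phrase pair (s, t) with joint count c_st and marginal counts c_s, c_t over N sentence pairs. All function names, the composed-probability input, and the thresholds alpha and eps are illustrative assumptions, not the paper's actual implementation:

```python
import math

def log_comb(n, k):
    # log of the binomial coefficient C(n, k), computed via lgamma
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def neg_log_p_value(c_st, c_s, c_t, N):
    """Significance score: negative log of the one-sided Fisher exact
    p-value, i.e. the probability of a joint count >= c_st if s and t
    co-occurred only by chance. Larger means less likely spurious."""
    p = 0.0
    for k in range(c_st, min(c_s, c_t) + 1):
        # hypergeometric probability of exactly k joint occurrences
        p += math.exp(log_comb(c_s, k) + log_comb(N - c_s, c_t - k)
                      - log_comb(N, c_t))
    return -math.log(p)

def entropy_loss(p_direct, p_composed):
    """Relative-entropy score: the divergence contribution if the pair
    is dropped and rebuilt from smaller constituents whose composed
    probability is p_composed. Zero when composition is lossless."""
    return p_direct * math.log(p_direct / p_composed)

def keep_pair(c_st, c_s, c_t, N, p_direct, p_composed,
              alpha=math.log(20), eps=1e-3):
    # Combined filter (illustrative thresholds): keep only pairs that
    # are both statistically significant and not cheaply composable.
    return (neg_log_p_value(c_st, c_s, c_t, N) > alpha
            and entropy_loss(p_direct, p_composed) > eps)
```

For example, a pair seen 5 times out of 5 occurrences of each side in 1000 sentences, whose composed probability is much lower than its direct probability, survives both filters; a pair whose composition is lossless is pruned regardless of significance.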
Similar Articles
Entropy-based Pruning of Backoff Language Models
A criterion for pruning parameters from N-gram backoff language models is developed, based on the relative entropy between the original and the pruned model. It is shown that the relative entropy resulting from pruning a single N-gram can be computed exactly and efficiently for backoff models. The relative entropy measure can be expressed as a relative change in training set perplexity. This le...
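As a sketch of the criterion (notation mine, not copied from the paper), the cost of pruning is the relative entropy between the original model p and the pruned model p', summed over histories h and words w:

```latex
D(p \,\|\, p') = \sum_{h, w} p(h, w) \,
    \log \frac{p(w \mid h)}{p'(w \mid h)}
```

and, as the abstract notes, this divergence corresponds to a relative change in training-set perplexity, roughly $\mathrm{PP}'/\mathrm{PP} - 1 \approx e^{D(p \| p')} - 1$, so N-grams whose removal keeps $D$ below a threshold can be pruned with a predictable perplexity cost.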
Some properties of the parametric relative operator entropy
The notion of entropy was introduced by Clausius in 1850, and some of the main steps towards the consolidation of the concept were taken by Boltzmann and Gibbs. Since then several extensions and reformulations have been developed in various disciplines with motivations and applications in different subjects, such as statistical mechanics, information theory, and dynamical systems. Fujii and Kam...
Study on interaction between entropy pruning and Kneser-Ney smoothing
The paper presents an in-depth analysis of a less known interaction between Kneser-Ney smoothing and entropy pruning that leads to severe degradation in language model performance under aggressive pruning regimes. Experiments in a data-rich setup such as google.com voice search show a significant impact in WER as well: pruning Kneser-Ney and Katz models to 0.1% of their original impacts speech ...
On Different Facets of Regularization Theory
This review provides a comprehensive understanding of regularization theory from different perspectives, emphasizing smoothness and simplicity principles. Using the tools of operator theory and Fourier analysis, it is shown that the solution of the classical Tikhonov regularization problem can be derived from the regularized functional defined by a linear differential (integral) operator in the...
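The classical Tikhonov problem mentioned above can be written, as a sketch in generic notation, as minimizing a data-fit term plus a smoothness penalty defined by a linear (differential or integral) operator $P$:

```latex
\min_{f} \; \sum_{i=1}^{N} \bigl(y_i - f(x_i)\bigr)^2
    + \lambda \,\| P f \|^2
```

where $\lambda > 0$ trades off fidelity to the data against the simplicity of the solution.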
Relative Entropy Rate between a Markov Chain and Its Corresponding Hidden Markov Chain
In this paper we study the relative entropy rate between a homogeneous Markov chain and a hidden Markov chain defined by observing the output of a discrete stochastic channel whose input is the finite state space homogeneous stationary Markov chain. For this purpose, we obtain the relative entropy between two finite subsequences of above mentioned chains with the help of the definition of...
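In standard notation (a sketch, not the paper's exact definitions), the relative entropy rate between two stationary processes $p$ and $q$ is the per-symbol limit of the Kullback-Leibler divergence between their finite-length marginals:

```latex
h(p \,\|\, q) = \lim_{n \to \infty} \frac{1}{n} \,
    D\!\bigl( p(x_1^n) \,\|\, q(x_1^n) \bigr)
```

which is the quantity obtained by comparing increasingly long subsequences of the Markov chain and the hidden Markov chain.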